Reducing the Overfitting of Adaboost by Controlling its Data Distribution Skewness

Authors

  • Yijun Sun
  • Sinisa Todorovic
  • Jian Li
Abstract

AdaBoost rarely suffers from overfitting on low-noise data. However, recent studies with highly noisy patterns have clearly shown that overfitting can occur. A natural strategy to alleviate the problem is to penalize the skewness of the data distribution during learning, so that a few of the hardest examples do not spoil the decision boundaries. In this paper, we pursue such a penalty scheme in the mathematical programming setting, which allows us to define a suitable classifier soft margin. By using two smooth convex penalty functions, based on the Kullback–Leibler (KL) divergence and the l2 norm, we derive two new regularized AdaBoost algorithms, referred to as AdaBoostKL and AdaBoostNorm2, respectively. We prove that our algorithms perform stage-wise gradient descent on a cost function defined in the domain of their associated soft margins. We demonstrate the effectiveness of the proposed algorithms through experiments over a wide variety of data sets. Compared with other regularized AdaBoost algorithms, our methods achieve equal or better performance.
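
To make the notion of data distribution skewness concrete, the following is a minimal sketch, not the AdaBoostKL or AdaBoostNorm2 algorithms themselves. It performs one step of standard, unregularized AdaBoost reweighting and then measures how far the resulting distribution has drifted from uniform, using the two quantities the abstract names as the basis of its penalties: the KL divergence and the l2 norm. The function names (`kl_from_uniform`, `l2_norm`, `adaboost_reweight`) and the toy labels are assumptions made for illustration only.

```python
import math


def kl_from_uniform(d):
    """KL(d || u), where u is uniform over len(d) examples; 0 means no skew."""
    n = len(d)
    return sum(p * math.log(p * n) for p in d if p > 0.0)


def l2_norm(d):
    """Euclidean norm of d; smallest (1/sqrt(n)) for the uniform distribution."""
    return math.sqrt(sum(p * p for p in d))


def adaboost_reweight(d, y, h):
    """One standard (unregularized) AdaBoost reweighting step.

    d: current distribution over examples (sums to 1)
    y: true labels in {-1, +1}
    h: weak-learner predictions in {-1, +1}
    Returns the new distribution and the weak learner's coefficient alpha.
    """
    err = sum(p for p, yi, hi in zip(d, y, h) if yi != hi)
    if not 0.0 < err < 0.5:
        raise ValueError("weighted error must lie strictly between 0 and 0.5")
    alpha = 0.5 * math.log((1.0 - err) / err)
    w = [p * math.exp(-alpha * yi * hi) for p, yi, hi in zip(d, y, h)]
    z = sum(w)
    return [wi / z for wi in w], alpha


# Hypothetical toy data: eight examples; the weak learner gets only example 0 wrong.
y = [1, 1, 1, 1, -1, -1, -1, -1]
h = [-1, 1, 1, 1, -1, -1, -1, -1]
d0 = [1.0 / len(y)] * len(y)
d1, alpha = adaboost_reweight(d0, y, h)

print("weight on the hard example:", d0[0], "->", d1[0])
print("KL from uniform:", kl_from_uniform(d0), "->", kl_from_uniform(d1))
print("l2 norm:", l2_norm(d0), "->", l2_norm(d1))
```

After a single round the one misclassified example already carries half of the total weight, and both skewness measures rise above their uniform baselines. A regularized variant in the spirit of the abstract would add a penalty on one of these measures; how that penalty enters the update is specified in the paper, not here.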

Similar resources

Increasing the Robustness of Boosting Algorithms within the Linear-programming Framework

AdaBoost has been successfully used in many signal classification systems. However, it has been observed that on highly noisy data AdaBoost easily leads to overfitting, which seriously constrains its applicability. In this paper, we address this problem by proposing a new regularized boosting algorithm LPnorm2-AdaBoost (LPNA). This algorithm arises from a close connection between AdaBoost and l...

A Fast Scheme for Feature Subset Selection to Avoid Overfitting in AdaBoost

AdaBoost is a well known, effective technique for increasing the accuracy of learning algorithms. However, it has the potential to overfit the training set because its objective is to minimize error on the training set. We show that with the introduction of a scoring function and the random selection of training data it is possible to create a smaller set of feature vectors. The selection of th...

Exp-Kumaraswamy Distributions: Some Properties and Applications

In this paper, we propose and study the exp-Kumaraswamy distribution. Some of its properties are derived, including the density function, hazard rate function, quantile function, moments, skewness and kurtosis. A data set is used to illustrate an application of the proposed distribution. Also, we obtain a new distribution by a transformation of the exp-Kumaraswamy distribution. New distribution is an...

Using Validation Sets to Avoid Overfitting in AdaBoost

AdaBoost is a well known, effective technique for increasing the accuracy of learning algorithms. However, it has the potential to overfit the training set because its objective is to minimize error on the training set. We demonstrate that overfitting in AdaBoost can be alleviated in a time-efficient manner using a combination of dagging and validation sets. Half of the training set is removed ...

AR-Boost: Reducing Overfitting by a Robust Data-Driven Regularization Strategy

We introduce a novel, robust data-driven regularization strategy called Adaptive Regularized Boosting (AR-Boost), motivated by a desire to reduce overfitting. We replace AdaBoost’s hard margin with a regularized soft margin that trades off a larger margin against misclassification errors. Minimizing this regularized exponential loss results in a boosting algorithm that relaxe...

Journal:
  • IJPRAI

Volume 20

Publication date: 2006